Printer Forensics using SVM Techniques
Abstract
In today's digital world, securing different forms of content is very important for protecting copyright and verifying authenticity. We have previously described the use of image texture analysis to identify the printer used to print a document. In particular, we described a set of features that can be used to provide forensic information about a document. In this paper we introduce a printer identification process that uses a support vector machine classifier. We also examine the effect of font size, font type, paper type, and "printer age".

Introduction

In today's digital world, securing different forms of content is very important for protecting copyright and verifying authenticity.[1,2] One example is the watermarking of digital audio and images. We believe that a marking scheme analogous to digital watermarking, but for documents, is very important.[1]

Printed material is a direct accessory to many criminal and terrorist acts. Examples include forgery or alteration of documents used for purposes of identity, security, or recording transactions. In addition, printed material may be used in the course of conducting illicit or terrorist activities. In both cases, the ability to identify the device or type of device used to print the material in question would provide a valuable aid for law enforcement and intelligence agencies.

We also believe that average users need to be able to print secure documents, for example boarding passes and bank transactions. Techniques currently exist to secure documents such as bank notes using paper watermarks, security fibers, holograms, or special inks.[3] The problem is that these security techniques can be cost prohibitive. Most of them either require special equipment to embed the security features or are simply too expensive for an average consumer.
Additionally, there are a number of applications in which it is desirable to be able to identify the technology, manufacturer, model, or even specific unit that was used to print a given document. We propose to develop two strategies for printer identification based on examining a printed document.

The first strategy is passive. It involves characterizing the printer by finding intrinsic features in the printed document that are characteristic of that particular printer, model, or manufacturer's products. We shall refer to this as the intrinsic signature. The intrinsic signature requires an understanding and modeling of the printer mechanism, and the development of analysis tools for the detection of the signature in a printed page with arbitrary content.

The second strategy is active. We embed an extrinsic signature in a printed page. This signature is generated by modulating the process parameters in the printer mechanism to encode identifying information such as the printer serial number and date of printing. To detect the extrinsic signature we use the tools developed for intrinsic signature detection. We have successfully been able to embed information into a document with electrophotographic (EP) printers by modulating an intrinsic feature known as banding. This work is discussed in [4].

We have previously reported techniques that use the print quality defect known as banding in EP printers as an intrinsic signature to identify the model and manufacturer of the printer.[5,6] However, it is difficult to detect the banding signal in text. One solution, which we have reported in [7], is to model the print quality defects as a texture in the printed areas of the document. To classify the document we used grayscale co-occurrence texture features. These features can be measured over small regions of the document such as individual text characters.
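To make the idea of grayscale co-occurrence texture features concrete, the following is a minimal sketch of a generic co-occurrence computation, not the feature set used in [7]. The pixel offset, the number of gray levels, the choice of the three statistics, and the small 4x4 patch are all assumptions made for illustration; the names `glcm` and `texture_features` are hypothetical.

```python
import math


def glcm(image, d_row, d_col, levels):
    """Return the normalized gray-level co-occurrence matrix for one offset.

    image is a list of rows of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs whose first pixel has
    level i and whose neighbor at offset (d_row, d_col) has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """Compute a few common statistics from a normalized co-occurrence matrix."""
    energy = sum(v * v for row in p for v in row)
    contrast = sum((i - j) ** 2 * v
                   for i, row in enumerate(p)
                   for j, v in enumerate(row))
    entropy = -sum(v * math.log2(v) for row in p for v in row if v > 0)
    return {"energy": energy, "contrast": contrast, "entropy": entropy}


# Illustrative 4x4 patch with 4 gray levels (synthetic, not scanned data).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
# Offset (0, 1) pairs each pixel with its right-hand neighbor.
features = texture_features(glcm(patch, 0, 1, levels=4))
```

In a printer identification setting, the patch would presumably be the scanned pixels of one text character, and the resulting feature vector would be passed to a classifier.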
Using these features we demonstrated the ability to process a page of printed text and correctly identify the printer that created it.

In our prior work, we did not account for several variables in our printer identification process. The type of paper, font type, font size, printer age, and other variables can affect the performance of our proposed classifier. We will examine the effects of these variables in this paper. We will also introduce a modified system using a support vector machine (SVM) classifier, which provides better generalization than the nearest neighbor classifier previously used.

This research was supported by a grant from the National Science Foundation, under Award Number 0219893. Address all correspondence to E. J. Delp at [email protected]

Table 1: Percent correct classification for varying font type

Manufacturer      Model           DPI
Hewlett-Packard   LaserJet 5M     600
Hewlett-Packard   LaserJet 6MP    600
Hewlett-Packard   LaserJet 1000   600
Hewlett-Packard   LaserJet 1200   600
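As a rough illustration of the SVM classifier mentioned above, the sketch below trains a linear SVM for two classes with the Pegasos sub-gradient method. This is an assumption-laden toy, not the system evaluated in the paper: the kernel, the training method, the regularization value `lam`, and the two-dimensional feature vectors are all invented for illustration, and the labels +1 and -1 stand for two hypothetical printers A and B.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Train a two-class linear SVM with the Pegasos sub-gradient method.

    samples: list of feature vectors; labels: +1 or -1 for each sample.
    Returns the weight vector, whose last entry acts as the bias.
    """
    # Append a constant 1 so the bias is learned as an extra weight.
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (gradient of the regularization term).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge loss is active only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 for a feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Synthetic feature vectors (NOT measured data): +1 = printer A, -1 = printer B.
samples = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(samples, labels)
```

Unlike a nearest neighbor classifier, which must keep every training vector, the trained SVM is summarized by the weight vector `w`. Identifying one printer among several would need a multi-class extension such as one-versus-rest, which is not shown here.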
Similar resources
Survey of Scanner and Printer Forensics at Purdue University
This paper describes methods for forensic characterization of scanners and printers. This is important in verifying the trust and authenticity of data and the device that created it. An overview of current forensic methods, along with current improvements of these methods is presented. Near-perfect identification of source scanner and printer is shown to be possible using these techniques.
A Novel Multi-size Block Benford's Law Scheme for Printer Identification
Identifying the originating device for a given media, i.e. the type, brand, model and other characteristics of the device, is currently one of the important fields of digital forensics. This paper proposes a forensic technique based on the Benford’s law to identify the printer’s brand and model from the printed-and-scanned images at which the first digit probability distribution of multi-size b...
File-type Identification with Incomplete Information
File-type Identification (FTI) is an important problem in digital forensics, intrusion detection, and other related fields. Using state-of-the-art classification techniques to solve FTI problems has begun to receive research attention; however, general conclusions have not been reached due to the lack of thorough evaluations for method comparison. This paper presents a systematic investigation o...
Texture based attacks on intrinsic signature based printer identification
Several methods exist for printer identification from a printed document. We have developed a system that performs printer identification using intrinsic signatures of the printers. Because an intrinsic signature is tied directly to the electromechanical properties of the printer, it is difficult to forge or remove. There are many instances where existence of the intrinsic signature in the prin...
Comparative study of Authorship Identification Techniques for Cyber Forensics Analysis
Authorship identification techniques are used to identify the most likely author of online messages from a group of potential suspects and to find evidence to support the conclusion. Cybercriminals misuse online communication to send blackmail or spam email and then attempt to hide their true identities to avoid detection. Authorship Identification of online messages is the contemp...